Tissue was taken from the olfactory epithelium of a 10-day-old mouse. 10x Genomics Chromium version 2 was used.
Read 1 contains the cell barcode (C) and the unique molecular identifier (UMI) (U): a 16-nucleotide cell barcode followed by a 10-nucleotide UMI, arranged as follows:
CCCCCCCCCCCCCCCCUUUUUUUUUU
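To make the layout concrete, here is a minimal sketch of splitting one Read 1 sequence into its barcode and UMI by position. The helper name `split_read1` and the example sequence are made up for illustration; Cell Ranger does this internally, so the snippet is not part of the pipeline.

```shell
# Hypothetical helper: split a 26-nt Read 1 sequence into cell barcode and UMI.
split_read1() {
    seq="$1"
    barcode=$(printf '%s' "$seq" | cut -c1-16)   # positions 1-16: cell barcode
    umi=$(printf '%s' "$seq" | cut -c17-26)      # positions 17-26: UMI
    printf '%s\t%s\n' "$barcode" "$umi"
}

# Made-up 26-nt sequence, for illustration only.
split_read1 "AAACCTGAGAAACCATTTGCATGCAA"
```

The output is the barcode and the UMI separated by a tab.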
The original concatenated FASTQ files have 27 nucleotides in the sequence line of each Read 1 record, one more than the 26 (16 + 10) expected for the barcode plus UMI. The last nucleotide and its matching quality character must be removed before downstream analysis can begin.
#!/bin/bash
#SBATCH --partition=bgmp
#SBATCH --cpus-per-task=1
#SBATCH --account=bgmp
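# Trim the last base (line 2 of each record) and its quality character (line 4 of each record).
# Note: /usr/bin/time here measures only zcat, not the sed stage of the pipeline.
/usr/bin/time -v zcat L35291_S1_L001_R1_001.fastq.gz | sed -e '2~4s/[ATCGN]$//' -e '4~4s/[!-J]$//' > L35291_Trimmed_S1_L001_R1_001.fastq
/usr/bin/time -v gzip L35291_Trimmed_S1_L001_R1_001.fastq
# Make sure the destination directory exists before moving the file into it.
mkdir -p TrimmedFull
/usr/bin/time -v mv L35291_Trimmed_S1_L001_R1_001.fastq.gz TrimmedFull/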
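A quick sanity check on the trimmed file can confirm that every Read 1 sequence is now 26 nucleotides long and that each quality line matches its sequence length. The following is a sketch only; the function name `check_read1_lengths` is hypothetical and was not part of the original workflow.

```shell
# Hypothetical check: read FASTQ on stdin and print the number of problems found.
# A problem is a sequence line that is not 26 nt, or a quality line whose
# length differs from that of its sequence line.
check_read1_lengths() {
    awk -v want=26 '
        NR % 4 == 2 { seqlen = length($0); if (seqlen != want) bad++ }
        NR % 4 == 0 { if (length($0) != seqlen) bad++ }
        END { print bad + 0 }
    '
}

# Intended use on the trimmed file (left commented out here):
# zcat TrimmedFull/L35291_Trimmed_S1_L001_R1_001.fastq.gz | check_read1_lengths
```

A result of 0 means no length problems were found. Any other value suggests the trimming step should be re-examined before running Cell Ranger.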
Cell Ranger count takes the trimmed FASTQ files, aligns the reads, filters them, and counts barcodes and UMIs. It outputs the feature-barcode matrices for downstream analysis (in Seurat).
#!/usr/bin/env bash
#SBATCH --job-name=cellrangerV4_count
#SBATCH --account=bgmp
#SBATCH --partition=bgmp
#SBATCH --output=cellrangerV4_count_%j.out
#SBATCH --error=cellrangerV4_count_%j.err
#SBATCH --time=1-23:59:59
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
conda activate bgmp_py37
dir="/projects/bgmp/shared/groups/2020/neuron_nerds"
/usr/bin/time -v \
$dir/CellRanger/cellranger-4.0.0/bin/cellranger count \
--id=Full_trimmed_count \
--fastqs=$dir/full_data/TrimmedFull \
--sample=L35291_Trimmed \
--transcriptome=$dir/refdata-gex-mm10-2020-A \
--localcores=8 \
--localmem=64
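Once the job finishes, it can help to confirm that the filtered feature-barcode matrix was written before loading it into Seurat. The sketch below assumes the standard Cell Ranger output layout (an `outs/filtered_feature_bc_matrix` directory under the `--id` folder, holding `barcodes.tsv.gz`, `features.tsv.gz`, and `matrix.mtx.gz`). The function name `check_matrix_dir` is hypothetical.

```shell
# Hypothetical check: succeed only if all three matrix files exist and are non-empty.
check_matrix_dir() {
    dir="$1"
    missing=0
    for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Intended use after the run above (left commented out here):
# check_matrix_dir Full_trimmed_count/outs/filtered_feature_bc_matrix \
#     && zcat Full_trimmed_count/outs/filtered_feature_bc_matrix/barcodes.tsv.gz | wc -l
```

The `wc -l` count in the commented usage is the number of barcodes kept in the filtered matrix.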
Summary from the Cell Ranger count output:
Download CellRanger_counts_summary64.html